Faculty  Working  Paper  92-0128 




Forecasting  Exchange  Rates  Using 
Feedforward  and  Recurrent  Neural  Networks 




Chung-Ming Kuan
Department of Economics
University of Illinois

Tung Liu
Department of Economics
Ball State University


Bureau  of  Economic  and  Business  Research 

College  of  Commerce  and  Business  Administration 

University  of  Illinois  at  Urbana-Champaign 





FORECASTING  EXCHANGE  RATES  USING 

FEEDFORWARD  AND  RECURRENT 

NEURAL  NETWORKS 

Chung-Ming  Kuan 

Department  of  Economics 
University  of  Illinois  at  Urbana-Champaign 

and 

Tung  Liu 

Department  of  Economics 
Ball  State  University 

May 15, 1992


† We would like to thank Roger Koenker, Bill Maloney and Paul Newbold for useful discussions. We are
most grateful to Richard Baillie for providing us with the data set and Hal White for permitting us to access
his NEUTON program, which helped us to improve our program significantly. C.-M. Kuan also thanks
the Research Board of the University of Illinois for partial research support.


Abstract 


In  this  paper  we  investigate  the  forecasting  ability  of  feedforward  and  recurrent  networks 
based on empirical foreign exchange rate data. A two-step procedure is proposed to construct
suitable networks, in which networks are selected based on the predictive stochastic
complexity  (PSC)  criterion.  We  find  that  PSC  is  a  sensible  criterion  in  selecting  networks 
and  that  neural  networks  perform  reasonably  well  in  terms  of  out-of-sample  MSE  and  sign 
predictions.  In  particular,  the  networks  selected  based  on  PSC  have  rather  satisfactory 
sign  prediction  results  and  compare  favorably  with  the  ARMA  models  selected  based  on 
the  SIC  criterion. 


1      Introduction 

Neural networks are a general class of nonlinear models that have been successfully applied in
many different fields. Numerous empirical and computational applications can be found in
the Proceedings of the International Joint Conference on Neural Networks and of the Conference
on Neural Information Processing Systems. In spite of their success in various fields, there
are only a few applications of neural networks in economics. Neural networks are novel
in econometric applications in the following two respects. First, the class of multi-layer
neural networks can approximate a large class of functions well (Hornik, Stinchcombe,
and White (1989) and Cybenko (1989)), whereas most commonly used nonlinear time
series models do not have this property. Second, the approximation capability of neural
networks requires only that the number of parameters grow linearly (Barron (1991)).
This is in contrast to polynomial, spline, and trigonometric expansions, which require the
number of parameters to grow exponentially to achieve the same approximation rate.
Thus, if the behavior of economic variables exhibits nonlinearity, a suitably constructed
neural network can serve as a useful tool to capture such regularity.

In  this  paper  we  investigate  possible  nonlinear  patterns  in  foreign  exchange  data  using 
feedforward and recurrent networks. It has been widely accepted that foreign exchange
rates are I(1) (integrated of order one) processes and that changes of exchange rates are
uncorrelated over time. Hence, exchange rates are not predictable in general. For a
comprehensive review of these issues we refer to Baillie and McMahon (1989). Since the
empirical  studies  supporting  these  conclusions  rely  mainly  on  linear  time  series  techniques, 
it  is  not  unreasonable  to  conjecture  that  the  linear  unpredictability  of  exchange  rates 
may  be  due  to  limitations  of  linear  models.  Hsieh  (1989)  finds  that  changes  of  exchange 
rates  may  be  nonlinearly  dependent,  even  though  they  are  linearly  uncorrelated.  Some 
researchers  also  give  evidence  in  favor  of  nonlinear  forecasts,  e.g.,  Taylor  (1980,1982), 
Engel  and  Hamilton  (1990),  Engel  (1991),  and  Chinn  (1991).  On  the  other  hand,  Diebold 
and  Nason  (1990)  find  that  nonlinearities  of  exchange  rates,  if  any,  cannot  be  exploited  to 
improve  forecasting.  Therefore,  we  focus  on  whether  neural  networks  can  provide  superior 
out-of-sample  forecasts. 

In our application, a two-step procedure for network construction is proposed. In the
first step, we compute the so-called "predictive stochastic complexity" (Rissanen (1987))
using a computationally efficient, recursive estimation method, from which we can select
suitable networks. In the second step, the recursive estimates are "smoothed" to improve
statistical efficiency. Our procedure differs from previous applications of feedforward
networks in economics, e.g., White (1988) and Kuan and White (1990), in that networks are
selected objectively. Also, the application of recurrent networks appears to be new in
economics; as a recurrent network can be viewed as a model with dynamic latent variables,
its performance relative to feedforward networks should also be of interest to researchers.
Our results show that predictive stochastic complexity is a sensible criterion in selecting
networks and that neural networks perform reasonably well in terms of out-of-sample MSE
and sign predictions. In particular, the networks obtained from the proposed procedure
yield rather satisfactory sign prediction results, especially for the Japanese Yen, Deutsche
Mark, and Swiss Franc series, and compare favorably with ARMA models.

This paper proceeds as follows. We review various network architectures and estimation
methods in section 2. The network construction procedures are described in section 3.
Empirical results are analyzed in section 4. Section 5 concludes the paper. We summarize
a "recurrent Newton" algorithm in the Appendix.

2      Feedforward  and  Recurrent  Networks 

In  this  section  we  briefly  describe  feedforward  and  recurrent  networks  and  associated 
estimation  methods.  For  more  details  see  Kuan  and  White  (1991a). 

2.1      Network  Architectures 

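A typical single-output, feedforward neural network consists of an input layer with n input
units, a hidden layer with q hidden units, and an output layer with an output unit. Let
x be an n-vector of input variables. The input variables first simultaneously activate q
hidden units through some function $\Psi$, and the hidden unit activations $h_i$, $i = 1, \ldots, q$,
then activate the output unit through some function $\Phi$ to produce the network output o.
Symbolically, we have

$$\begin{aligned}
h_i &= \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_j\Big), \qquad i = 1, \ldots, q, \\
o &= \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i h_i\Big).
\end{aligned} \tag{1}$$

More compactly, we can write

$$\begin{aligned}
o &= \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_j\Big)\Big) \\
&=: f(x, \theta),
\end{aligned} \tag{2}$$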

where $\theta$ is the vector of parameters containing all $\beta$'s and $\gamma$'s. This is a flexible nonlinear
functional form in that the activation functions $\Psi$ and $\Phi$ can be chosen quite arbitrarily,
except that $\Psi$ is generally required to be a bounded function. Hornik, Stinchcombe, and
White (1989) and Cybenko (1989) show that the function f in (2) can approximate a
large class of functions arbitrarily well (in a suitable metric), provided that the number of
hidden units, q, is sufficiently large. This property is very similar to that of nonparametric
methods. Barron (1991) also shows that, to achieve the same approximation rate, a
feedforward network uses only linearly many parameters, $O(qn)$, whereas traditional
polynomial, spline, and trigonometric expansions use exponentially many parameters, $O(q^n)$.
These two properties make feedforward networks an attractive econometric tool in
(nonparametric) applications.
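
As an illustration of the functional form in (2), the following is a minimal sketch of the network output, assuming a logistic hidden activation and an identity output activation. The function and argument names are hypothetical and are not taken from the authors' program.

```python
import numpy as np

def logistic(z):
    """Logistic squashing function; bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward_output(x, beta0, beta, gamma0, gamma):
    """Output of a single-hidden-layer network as in (2).

    x      : (n,) input vector
    beta0  : scalar output bias
    beta   : (q,) hidden-to-output weights
    gamma0 : (q,) hidden-unit biases
    gamma  : (q, n) input-to-hidden weights
    Assumption: the output activation is the identity.
    """
    h = logistic(gamma0 + gamma @ x)   # hidden-unit activations h_1, ..., h_q
    return beta0 + beta @ h

# With all gamma's equal to zero every hidden unit outputs 0.5,
# so the network output is beta0 + 0.5 * sum(beta).
x = np.array([0.2, -0.1])
o = feedforward_output(x, 0.5, np.array([1.0, 2.0, 3.0]),
                       np.zeros(3), np.zeros((3, 2)))
```

With n inputs and q hidden units this parameterization has q(n + 1) + q + 1 free parameters, which is the linear growth in q and n referred to in the text.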

However,  the  inputs  x  included  in  a  feedforward  network  may  not  be  sufficient  to 
characterize  the  behavior  of  targets  in  some  applications.  In  view  of  this  deficiency,  various 
networks  allowing  feedbacks  have  been  proposed.  In  particular,  we  consider  the  following 
recurrent  network  due  to  Elman  (1988): 

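$$\begin{aligned}
h_{i,t} &= \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_{j,t} + \sum_{\ell=1}^{q} \delta_{i\ell} h_{\ell,t-1}\Big), \qquad i = 1, \ldots, q, \\
o_t &= \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i h_{i,t}\Big).
\end{aligned} \tag{3}$$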

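Here, the hidden-unit activations $h_i$ feed back to the input layer with delay and serve to
"memorize" the past information, cf. (1). Note that we have added the time index t to (3)
to indicate the feedback (time delay) effect. It is straightforward to see that by recursive
substitution,

$$\begin{aligned}
o_t &= \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_{j,t} + \sum_{\ell=1}^{q} \delta_{i\ell} h_{\ell,t-1}\Big)\Big) \\
&= \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_{j,t} + \sum_{\ell=1}^{q} \delta_{i\ell} \Psi\Big(\gamma_{\ell 0} + \sum_{k=1}^{n} \gamma_{\ell k} x_{k,t-1} + \sum_{m=1}^{q} \delta_{\ell m} h_{m,t-2}\Big)\Big)\Big) \\
&= \cdots \\
&=: g(x^t, \theta),
\end{aligned} \tag{4}$$

where $x^t = (x_t, x_{t-1}, \ldots, x_1)$ and $\theta$ is the vector of parameters containing all $\beta$'s, $\gamma$'s, and
$\delta$'s. In contrast with (2), the network output $o_t$ is a function of $x_t$ and its entire history.
We thus expect that a recurrent network may capture more dynamic characteristics of $y_t$
than does a feedforward network.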
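
The dependence of $o_t$ on the whole input history can be seen in a short sketch of the recursion in (3). It assumes a logistic hidden activation, an identity output activation, and a zero initial hidden state; these choices and all names are illustrative assumptions only.

```python
import numpy as np

def elman_outputs(X, beta0, beta, gamma0, gamma, delta, h0=None):
    """Outputs o_1, ..., o_T of an Elman-type recurrent network as in (3).

    X      : (T, n) inputs, one row per period
    beta0  : scalar output bias
    beta   : (q,) hidden-to-output weights
    gamma0 : (q,) hidden-unit biases
    gamma  : (q, n) input-to-hidden weights
    delta  : (q, q) recurrent weights on the lagged hidden activations
    h0     : (q,) initial hidden state; zero if omitted (an assumption)
    """
    q = beta.shape[0]
    h = np.zeros(q) if h0 is None else np.asarray(h0, dtype=float)
    out = np.empty(X.shape[0])
    for t, x_t in enumerate(X):
        # h_{i,t} depends on x_t and on the previous activations h_{.,t-1}
        h = 1.0 / (1.0 + np.exp(-(gamma0 + gamma @ x_t + delta @ h)))
        out[t] = beta0 + beta @ h
    return out

# Changing only the first input changes later outputs when delta is nonzero.
beta, gamma0, gamma = np.ones(2), np.zeros(2), np.ones((2, 1))
X_a = np.zeros((4, 1))
X_b = X_a.copy()
X_b[0, 0] = 1.0
rec_a = elman_outputs(X_a, 0.0, beta, gamma0, gamma, 0.5 * np.ones((2, 2)))
rec_b = elman_outputs(X_b, 0.0, beta, gamma0, gamma, 0.5 * np.ones((2, 2)))
ff_a = elman_outputs(X_a, 0.0, beta, gamma0, gamma, np.zeros((2, 2)))
ff_b = elman_outputs(X_b, 0.0, beta, gamma0, gamma, np.zeros((2, 2)))
```

Setting all recurrent weights to zero removes the feedback, and the network then reduces to the feedforward form (2) applied period by period.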

2.2      Estimation  Methods 

Given a target variable y and a feedforward network (2), we want to find suitable parameters
$\theta^*$ minimizing

$$E|y - f(x,\theta)|^2 = E|y - E(y|x)|^2 + E|E(y|x) - f(x,\theta)|^2. \tag{5}$$

This is equivalent to minimizing $E|E(y|x) - f(x,\theta)|^2$. That is, we want to use a feedforward
network to approximate the unknown conditional mean function. Since $E(y|x)$ is the best
$L_2$-predictor of y given x, the network output $f(x,\theta^*)$ should match the target variables
fairly closely, at least in the $L_2$ sense. In view of (5), the unknown parameters can be
estimated using the method of Nonlinear Least Squares (NLS). Alternatively, recursive
estimation methods may be used. Although recursive estimation is important for adaptive
learning and on-line signal processing, it is well known that recursive algorithms do not
utilize the data efficiently in finite samples. However, recursive estimation can provide
useful starting values for the NLS estimator and facilitate network selection (see discussions
in Section 3). Specifically, we consider the following stochastic Newton algorithm:

$$\begin{aligned}
\hat\theta_{t+1} &= \hat\theta_t + \eta_t \hat G_t^{-1} \nabla f(x_t, \hat\theta_t)\,[y_t - f(x_t, \hat\theta_t)], \\
\hat G_{t+1} &= \hat G_t + \eta_t\,[\nabla f(x_t, \hat\theta_t) \nabla f(x_t, \hat\theta_t)' - \hat G_t],
\end{aligned} \tag{6}$$

where $\nabla f(x,\theta)$ is the (column) gradient vector of f with respect to $\theta$ and $\{\eta_t\}$ is a sequence
of learning rates of order 1/t. Kuan and White (1991a) show that the estimates of the
algorithm (6) are root-T consistent and asymptotically equivalent to the NLS estimator
under very general conditions. In practice, an algebraically equivalent form of (6) can
be employed to avoid matrix inversion in the algorithm; see Kuan and White (1991a) for
details.
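
A minimal sketch of the updating equations in (6) for a network with logistic hidden units and an identity output is given below. The parameter packing, the initial matrix (the identity), and the learning rate 1/(t + 1) are assumptions made for illustration; the sketch solves a linear system at each step rather than using the inversion-free form mentioned above.

```python
import numpy as np

def unpack(theta, n, q):
    """Split theta into (beta0, beta, gamma0, gamma); this packing is an assumption."""
    beta0 = theta[0]
    beta = theta[1:1 + q]
    gamma0 = theta[1 + q:1 + 2 * q]
    gamma = theta[1 + 2 * q:].reshape(q, n)
    return beta0, beta, gamma0, gamma

def f_and_grad(x, theta, n, q):
    """Network output f(x, theta) and its gradient with respect to theta."""
    beta0, beta, gamma0, gamma = unpack(theta, n, q)
    h = 1.0 / (1.0 + np.exp(-(gamma0 + gamma @ x)))   # logistic hidden units
    f = beta0 + beta @ h                              # identity output
    dz = beta * h * (1.0 - h)                         # df / d(hidden pre-activation)
    grad = np.concatenate(([1.0], h, dz, np.outer(dz, x).ravel()))
    return f, grad

def stochastic_newton(X, y, theta0, n, q, passes=1):
    """Run the recursion (6); returns final theta and last-pass prediction errors."""
    theta = np.array(theta0, dtype=float)
    G = np.eye(theta.size)                 # assumed initial value of G
    t = 0
    errors = []
    for _ in range(passes):
        errors = []
        for x_t, y_t in zip(X, y):
            t += 1
            eta = 1.0 / (t + 1)            # learning rate of order 1/t
            f, grad = f_and_grad(x_t, theta, n, q)
            err = y_t - f                  # uses theta estimated before seeing y_t
            errors.append(err)
            theta = theta + eta * np.linalg.solve(G, grad) * err
            G = G + eta * (np.outer(grad, grad) - G)
    return theta, np.array(errors)
```

Because each error is computed before the parameters are updated with the current observation, the errors from a pass can be reused directly as one-step-ahead prediction errors.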

Similarly, the parameters of interest of a recurrent network are $\theta^*$ that minimize

$$E|y_t - g(x^t,\theta)|^2,$$

and $g(x^t,\theta^*)$ can be viewed as an approximation of $E(y_t|x^t)$. However, estimation of a
recurrent network is not straightforward. In view of (4), the network output $o_t$ depends on
$\theta$ directly and indirectly through the presence of lagged hidden-unit activations. Hence
g is a very complex function of $\theta$. In particular, in calculating the derivatives of g with
respect to $\theta$, parameter dependence of the feedbacks $h_{i,t-1}$ must be taken into account. Owing
to this "state dependent" structure, the method of NLS becomes infeasible and the
algorithm (6) is invalid. In our applications, a "recurrent Newton" algorithm analogous to
(6) is adopted. Kuan and Liu (1992) show that this algorithm is strongly consistent,
provided that the recurrent connections $\delta$'s are constrained suitably, and is computationally more
efficient than the "recurrent back-propagation" algorithm proposed in Kuan, Hornik, and
White (1991). To avoid introducing excessive notation here, the details of this algorithm
are deferred to the Appendix. The working papers cited above are available upon request
from the first author.

3      Network  Construction 

In this paper, we choose the activation function $\Psi$ as the logistic function and $\Phi$ as the
identity function. These choices are quite standard in the neural network literature. We adopt
the  following  two-step  procedure  to  estimate  networks. 

1. Perform recursive estimation using the stochastic Newton algorithm (6) or its recurrent
counterpart described in the Appendix.

• We generate 10 sets of parameters and choose the one with the lowest mean
squared error (MSE) as the initial values for the recursive algorithms.

• We let the algorithm run through the data set 10 times; the final estimates
from each pass of the data are used as the initial values of the next pass.


2. Perform NLS estimation using the FORTRAN subroutine library MINPACK.

• For a feedforward network, the final recursive estimates from the last pass of the
data are used as initial values of the NLS estimator for $\theta$.

• For a recurrent network, we fix the recurrent connections $\delta$'s at the final recursive
estimates and use the final estimates as initial values of the NLS estimator for the
forward connections $\beta$'s and $\gamma$'s.

From our experience, performing recursive estimation more than 5 times yields quite stable
results. In "smoothing" the estimates for a recurrent network, the parameters $\delta$'s are fixed
to avoid constrained minimization. (Recall that the $\delta$'s must be constrained suitably to ensure
proper convergence behavior.) Hence, the second step for a recurrent network is analogous
to building a partially hard-wired recurrent network (Kuan and Hornik (1991)).

The more difficult problem is to determine network complexity. A very simple network
may not be able to approximate the unknown conditional mean function well; an excessively
complex network may overfit the data. There is, however, no definite conclusion
regarding the determination of network complexity. One possible criterion is the Schwarz
(1978) Information Criterion (SIC). Rissanen (1983, 1984) shows that this criterion can be
applied to a more general setting than linear models; in particular, the SIC is asymptotically
equivalent to the stochastic complexity of a model (Rissanen (1987)). When the SIC is
applied to determine the order of an ARMA model, it is also known that the SIC is
dimensionally consistent (Hannan (1980)). Note, however, that selecting networks based on
SIC is computationally demanding because NLS is required for estimating every possible
network.

An alternative criterion to regularize network complexity is the "Predictive Stochastic
Complexity" (PSC) criterion due to Rissanen (1986a,b); see also Rissanen (1987). Given
a function $h(x,\theta)$, where $\theta$ is a k-dimensional parameter vector, and a sample of T observations,
PSC is computed from honest prediction errors as

$$\frac{1}{T-k} \sum_{t=k+1}^{T} \big(y_t - h(x_t, \hat\theta_{t-1})\big)^2, \tag{7}$$

where $\hat\theta_{t-1}$ is the parameter estimate obtained using the data up to time t − 1. The
prediction error $y_t - h(x_t, \hat\theta_{t-1})$ is "honest" in the sense that no information at time t
or beyond is used to calculate $\hat\theta_{t-1}$. A particular model is selected if it has the smallest
PSC within a class of models. If two models have the same PSC, the simpler one is
selected. Clearly, the PSC criterion is based on forward validation, which is particularly
important in forecasting. Rissanen also shows that for encoding a sequence of numbers,
the PSC criterion can determine the code with the shortest code length asymptotically.
For a thorough discussion of the notion of stochastic complexity we refer to Rissanen
(1989). This criterion has also been applied to determine the order of ARMA models, e.g.,
Gerencser (1990), Hemerly and Davis (1989), and Hannan, McDougall, and Poskitt (1989).
Obviously, calculation of PSC is also computationally demanding if NLS is required to
estimate $\hat\theta_t$ at each t. Following the idea of Gerencser and Rissanen (1991), we can compute
$\hat\theta_t$ using the recursive estimation method, which is more tractable computationally. In our
two-step procedure, PSC can be computed easily in the first step; specifically, PSC is
computed from the last pass of the data in recursive estimation.
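
To make the PSC calculation concrete, the sketch below computes (7) from a sequence of honest prediction errors and applies the selection rule (smallest PSC, with ties going to the simpler model). The one-parameter "recursive mean" model is a hypothetical stand-in for a recursively estimated network, and the starting value of zero is an assumption.

```python
import numpy as np

def psc(honest_errors, k):
    """PSC as in (7): average squared honest error over t = k+1, ..., T."""
    e = np.asarray(honest_errors, dtype=float)
    T = e.size
    return np.sum(e[k:] ** 2) / (T - k)

def recursive_mean_errors(y):
    """Honest errors of the toy model h(x, theta) = theta (k = 1).

    theta_{t-1} is the mean of y_1, ..., y_{t-1}; theta_0 = 0 by assumption,
    so the error at time t never uses y_t or later observations.
    """
    errors, total = [], 0.0
    for t, y_t in enumerate(y):
        theta_prev = total / t if t > 0 else 0.0
        errors.append(y_t - theta_prev)
        total += y_t
    return np.array(errors)

def select_by_psc(candidates):
    """candidates: list of (label, psc_value, k). Smallest PSC wins; ties go to smaller k."""
    return min(candidates, key=lambda c: (c[1], c[2]))[0]

errs = recursive_mean_errors([1.0, 3.0, 5.0])   # errors: 1, 2, 3
value = psc(errs, k=1)                          # (2**2 + 3**2) / (3 - 1) = 6.5
```

In the two-step procedure the honest errors would instead come from the last pass of the recursive algorithm, with k equal to the number of network parameters.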

4      Empirical  Results 

In this paper five exchange rates, the Canadian Dollar (CD), Deutsche Mark (DM),
Japanese Yen (JY), Pound Sterling (PS), and Swiss Franc (SF), are investigated. The
data are daily opening bid prices in the New York foreign exchange market from March 1, 1980 to
January 28, 1985, consisting of 1245 observations. All series except PS are US dollars per
unit of foreign currency. This data set has also been used in Baillie and Bollerslev (1989).

Let $S_{i,t}$ denote the i-th exchange rate at time t, and $y_{i,t} = \log S_{i,t} - \log S_{i,t-1}$, i =
CD, DM, JY, PS, SF. By applying various unit-root tests of Phillips (1987), Phillips and
Perron (1988), and Perron (1988), Baillie and Bollerslev (1989) find that $\log S_{i,t}$ are unit
root processes without drift and that changes of log exchange rates behave like a martingale
difference sequence. In addition, we estimate 36 ARMA models from ARMA(0,0) to
ARMA(5,5) on $y_{i,t}$ and evaluate the resulting SIC values. These SIC values, which are
summarized in Table 1, indicate that ARMA(0,0) is the best model for all five series.
As the SIC is dimensionally consistent, this result agrees with the finding of Baillie and
Bollerslev.

[ Table 1 About Here ]


To construct neural networks, we follow the two-step procedure described in Section 3
and take $y_{i,t}$ as target variables. We use 1194 observations for in-sample estimation and
reserve the last 50 observations for out-of-sample forecasting. In the first step, 36
feedforward and recurrent networks (with 1-6 lagged targets as inputs and 1-6 hidden units)
are estimated using the Newton algorithms. We shall write the network with L lags and
H hidden units as the network (L,H). For each series, the five networks with the best
(smallest) PSC are selected. In the second step, the parameter estimates of the selected
networks are "smoothed" using NLS. Table 2 contains the PSC values of all feedforward and
recurrent networks. To save space, we do not report in-sample MSE here. It is not surprising to
note that, in general, the in-sample MSEs from NLS estimation are much smaller than those
from recursive estimation.

[  Table  2  About  Here  ] 

It is typical to evaluate forecasting performance based on out-of-sample MSE. Another
important criterion is to compare out-of-sample sign predictions of different models. Sign
prediction provides forecasts of the direction of future changes, and hence gives important
information in financial forecasting. In an extreme case, a model could have a small
out-of-sample MSE but predict all the signs incorrectly, and hence be virtually useless. We summarize
the out-of-sample MSE and percentage of correct sign predictions of the selected networks in
Table 3. As a comparison, out-of-sample forecasting results from five ARMA models,
including ARMA(0,0), which is the best model based on the SIC, are also included. It
can be seen that the PSC criterion selects a wide variety of networks for each series.
Note, however, that the PSC criterion tends to select more complex networks; most of the
selected networks contain 3-6 hidden units. From Table 3 we also observe the following.

1.  Out-of-sample  MSE: 

(a)  The  selected  feedforward  and  recurrent  networks  do  not  dominate  each  other, 
and  a  better  network  (i.e.,  a  network  with  smaller  PSC)  need  not  have  better 
out-of-sample  MSE. 

(b) For the DM and JY, four out of five selected feedforward networks perform
better than all ARMA models; for the SF, three out of five selected feedforward
networks perform better than all ARMA models.

(c) For the JY and SF, four out of five selected recurrent networks perform better
than all ARMA models.

(d) For the CD, ARMA models perform better than network models.

(e) The best feedforward network performs better than ARMA(0,0) for all five
series, and the best recurrent network performs better than ARMA(0,0) for the
DM, JY, and SF.

2.  Out-of-sample  sign  predictions: 

(a) Correct sign predictions of ARMA models fluctuate around 50%, except for the
PS, for which they are close to 60%.

(b) For the DM, JY, and SF, the selected feedforward networks perform better than
ARMA models and usually have more than 60% correct sign predictions. In
particular, for the JY, all five selected feedforward networks have more than 60%
correct sign predictions, and two of them have 66% correct; for the SF, the four
best feedforward networks have more than 60% correct sign predictions.

(c) For the DM, JY, PS, and SF, the selected recurrent networks perform better
than ARMA models; and for the PS and SF, three out of five selected recurrent
networks have more than 60% correct sign predictions.

(d) For all series except the CD, almost all the selected networks have more than
50% correct sign predictions.
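
The two out-of-sample criteria used in this comparison can be sketched as follows. The function names are hypothetical, and treating a zero change or zero forecast as its own sign is an assumption about a detail the text does not specify.

```python
import numpy as np

def out_of_sample_mse(y, y_hat):
    """Mean squared forecast error over the hold-out observations."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

def correct_sign_percent(y, y_hat):
    """Percentage of forecasts whose sign matches the realized change."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.sign(y) == np.sign(y_hat))

# The extreme case mentioned in the text: a small MSE with every sign wrong.
y_true = [0.01, -0.01]
y_pred = [-0.001, 0.001]
mse = out_of_sample_mse(y_true, y_pred)        # about 1.21e-4
signs = correct_sign_percent(y_true, y_pred)   # 0.0
```

The example shows why the two criteria can rank models differently: MSE rewards forecasts that are close to zero when the realized changes are small, whereas the sign criterion only rewards getting the direction right.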

[  Table  3  About  Here  ] 

The results above suggest that the PSC criterion is quite a sensible criterion. The
best network selected based on PSC has good out-of-sample performance and compares
favorably with the best ARMA model selected based on the SIC. As far as out-of-sample
MSE is concerned, the selected networks seem to perform well for the DM, JY, and SF,
but their performance does not dominate ARMA models significantly. In terms of
out-of-sample sign predictions, while ARMA models usually perform no better than tossing a coin
(i.e., a 50% chance of being correct), network models have rather satisfactory predictive ability.
This is especially true for the DM, JY, and SF series. It also appears that feedforward
networks have more stable sign prediction results than recurrent networks. It is somewhat
surprising to us that recurrent networks do not perform as well as feedforward networks.
One possible interpretation is that the feedback structure in recurrent networks cannot be
very effective if there is very little correlation across target variables.

To  obtain  a  complete  picture  of  the  performance  of  feedforward  and  recurrent  networks, 
we  "smoothed"  all  other  networks  not  selected  by  the  PSC  criterion.  By  inspecting  the 
resulting  SIC  values,  we  find  that  the  SIC  criterion  almost  always  selects  the  simplest 
network  (1,1).  This  is  true  for  both  feedforward  and  recurrent  networks.  (We  do  not  give 
a  detailed  table  of  the  SIC  values  here.)  Note  that  the  SIC  penalizes  a  model  in  terms  of 
the  number  of  parameters.  Thus,  the  SIC  of  the  network  (2,2)  has  the  same  complexity 
penalty  as  the  SIC  of  the  network  (6,1),  but  clearly,  the  nonlinear  structures  of  these  two 
networks are very different. The out-of-sample MSE and sign prediction results of all
36 networks are collected in Tables 4 and 5. We compare these results with four ARMA
models used in Table 3 and give a summary in Table 6. As ARMA(1,0) and ARMA(0,1)
perform very similarly, comparison with ARMA(0,1) is not included in Table 6.
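
The remark that the networks (2,2) and (6,1) carry the same SIC penalty can be checked by counting parameters. The sketch below assumes the feedforward parameterization of (2), with H(L + 1) hidden-layer parameters and H + 1 output-layer parameters, and uses one common form of the SIC, log(MSE) + k log(T)/T; this form is an assumption, as the paper does not state its formula in this section.

```python
import math

def n_params_feedforward(lags, hidden):
    """Parameters of a network (L, H): H*(L+1) hidden-layer weights and biases,
    plus H output weights and one output bias."""
    return hidden * (lags + 1) + hidden + 1

def sic(mse, k, T):
    """A common form of the Schwarz criterion (assumed here, not taken from the paper)."""
    return math.log(mse) + k * math.log(T) / T

k_22 = n_params_feedforward(2, 2)   # 2*3 + 2 + 1 = 9
k_61 = n_params_feedforward(6, 1)   # 1*7 + 1 + 1 = 9

# With equal in-sample MSE, the two networks receive identical SIC values.
T = 1194
same_penalty = sic(1e-5, k_22, T) == sic(1e-5, k_61, T)
```

Both networks have nine parameters, so the criterion cannot distinguish between a network with more lags and one with more hidden units unless their in-sample fits differ.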

Tables 4 and 5 show that feedforward and recurrent networks perform similarly; in
particular, network models have quite satisfactory sign prediction results for the DM,
JY, PS, and SF series. We also observe that a more complex network need not predict
better than a simpler network and that no network with a certain number of hidden units
can systematically beat other networks with a different number of hidden units. In view of
Table 6, we can see that for three exchange rates (DM, JY, and SF), both feedforward and
recurrent networks usually perform better than ARMA models in terms of out-of-sample
MSE. Again, the out-of-sample MSEs of network models do not significantly dominate
those of ARMA models. For all five exchange rates, network models have much better
sign prediction results than ARMA models. This is compatible with previous prediction
results based on selected networks. We have also estimated networks without the bias
term $\beta_0$ to see whether sign predictions can be improved. However, the estimation results
turn out to be unstable; many parameter estimates tend to be extremely large or small
with huge variances.

[ Tables 4-6 About Here ]

5      Conclusions

In this paper we have carefully estimated feedforward and recurrent networks to forecast
changes of log exchange rates. We find that PSC is a sensible criterion in selecting
networks. Based on this criterion, it is possible to construct a network with better
out-of-sample MSE and/or sign prediction than ARMA models. Therefore, the proposed
two-step procedure may be used as a standard network construction procedure in other
applications. As far as out-of-sample MSE is concerned, we share the conclusion
of Diebold and Nason (1990) that nonlinearities of exchange rates, if any, may not be
exploited to improve point prediction. On the other hand, if we are not so ambitious
about point forecasts and confine ourselves to sign predictions, our results also suggest
that network models perform quite well for this purpose. In particular, they usually
perform better than ARMA models and coin tossing. Finally, different exchange rates have
different behavior and characteristics. In our application, network models do not predict
well for the CD but perform quite well for the DM, JY, and SF series. It also appears
that feedforward networks perform slightly better than recurrent networks and have more
stable prediction results.

Appendix

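A Summary of the Recurrent Newton Algorithm:

We write a recurrent network as

$$\begin{aligned}
o_t &= \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i \Psi\Big(\gamma_{i0} + \sum_{j=1}^{n} \gamma_{ij} x_{j,t} + \sum_{\ell=1}^{q} \delta_{i\ell} h_{\ell,t-1}\Big)\Big) \\
&=: \Phi\Big(\beta_0 + \sum_{i=1}^{q} \beta_i \psi_i(x_t, h_{t-1}, \theta)\Big) \\
&=: \phi(x_t, h_{t-1}, \theta),
\end{aligned}$$

where $\theta$ includes all $\beta$'s, $\gamma$'s, and $\delta$'s. Note that $h_{t-1}$ is also a function of $\theta$. The recurrent
Newton algorithm contains the following updating equations:

$$\begin{aligned}
e_t &= y_t - \phi(x_t, \hat h_{t-1}, \hat\theta_t), \\
\nabla e_t &= -\phi_\theta(x_t, \hat h_{t-1}, \hat\theta_t)' - \hat\Delta_t\, \phi_h(x_t, \hat h_{t-1}, \hat\theta_t)', \\
\hat\theta_{t+1} &= \hat\theta_t - \eta_t \hat G_t^{-1} \nabla e_t\, e_t, \\
\hat G_{t+1} &= \hat G_t + \eta_t (\nabla e_t \nabla e_t' - \hat G_t),
\end{aligned}$$

where the i-th hidden unit is updated according to

$$\begin{aligned}
\hat h_{i,t} &= \psi_i(x_t, \hat h_{t-1}, \hat\theta_t) \\
&= \Psi\Big(\hat\gamma_{i0,t} + \sum_{j=1}^{n} \hat\gamma_{ij,t}\, x_{j,t} + \sum_{\ell=1}^{q} \hat\delta_{i\ell,t}\, \hat h_{\ell,t-1}\Big), \qquad i = 1, \ldots, q,
\end{aligned}$$

the j-th column of $\hat\Delta_{t+1}$ is updated according to

$$\hat\Delta_{j,t+1} = \psi_{j,\theta}(x_t, \hat h_{t-1}, \hat\theta_t)' + \hat\Delta_t\, \psi_{j,h}(x_t, \hat h_{t-1}, \hat\theta_t)', \qquad j = 1, \ldots, q,$$

and the initial values $\hat\theta_0$, $\hat h_0$, and $\hat\Delta_0$ are chosen arbitrarily. Here, $\phi_\theta$ and $\phi_h$ are (row)
vectors of the first order derivatives of $\phi$ with respect to $\theta$ and h, respectively, and $\psi_{i,\theta}$
and $\psi_{i,h}$ are (row) vectors of the first order derivatives of the i-th hidden unit $\psi_i$ with
respect to $\theta$ and h, respectively.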

This algorithm differs from the recurrent back-propagation algorithm of Kuan, Hornik,
and White (1991) in that a Newton direction $\hat G_t^{-1}$ is added in the updating equation of $\hat\theta_t$.
Note that in this algorithm the derivatives of the prediction error e with respect to $\theta$ contain
two parts because e depends on $\theta$ directly and indirectly through the presence of lagged
hidden-unit activations, i.e.,

$$\nabla e_t = -\phi_\theta(x_t, h_{t-1}, \theta_t)' - \frac{\partial h_{t-1}}{\partial \theta}\, \phi_h(x_t, h_{t-1}, \theta_t)'.$$

Hence, the updating equation for $\Delta_t$ allows us to update the $\partial h_{t-1}/\partial\theta$ term recursively.
Clearly, a recurrent network that does not depend on $h_{t-1}$ is a feedforward network. In this case, the $\phi_h$
term is zero, and there is no need to consider the $\Delta_t$ term. The recurrent Newton algorithm
simply reduces to the standard Newton algorithm (6). For more details see Kuan and
Liu (1992).

Table 1. The SIC Values of ARMA Models for Changes of log Exchange Rates.


ARMA 

Models 

SIC  Values 

CD 

DM 

JY 

PS 

SF 

(0,0) 

-11.9439 

-9.8901 

-9.9605 

-10.0422 

-9.7273 

(0,1) 

-11.9392 

-9.8850 

-9.9588 

-10.0372 

-9.7222 

(0,2) 

-11.9333 

-9.8810 

-9.9529 

-10.0316 

-9.7201 

(0,3) 

-11.9276 

-9.8762 

-9.9476 

-10.0273 

-9.7142 

(0,4) 

-11.9228 

-9.8704 

-9.9461 

-10.0233 

-9.7084 

(0,5) 

-11.9182 

-9.8648 

-9.9421 

-10.0174 

-9.7039 

(1,0) 

-11.9389 

-9.8843 

-9.9596 

-10.0385 

-9.7214 

(1,1) 

-11.9335 

-9.8786 

-9.9537 

-10.0333 

-9.7160 

(1,2) 

-11.9338 

-9.8759 

-9.9478 

-10.0278 

-9.7134 

(1,3) 

-11.9316 

-9.8697 

-9.9446 

-10.0249 

-9.7076 

(1,4) 

-11.9294 

-9.8637 

-9.9412 

-10.0191 

-9.7019 

(1,5) 

-11.9122 

-9.8580 

-9.9372 

-10.0139 

-9.6971 

(2,0) 

-11.9340 

-9.8792 

-9.9531 

-10.0327 

-9.7184 

(2,1) 

-11.9317 

-9.8742 

-9.9472 

-10.0268 

-9.7126 

(2,2) 

-11.9236 

-9.8695 

-9.9417 

-10.0222 

-9.7074 

(2,3) 

-11.9275 

-9.8635 

-9.9410 

-10.0183 

-9.7017 

(2,4) 

-11.9227 

-9.8573 

-9.9415 

-10.0127 

-9.6953 

(2,5) 

-11.9089 

-9.8517 

-9.9368 

-10.0070 

-9.6916 

(3,0) 

-11.9293 

-9.8745 

-9.9474 

-10.0284 

-9.7122 

(3,1) 

-11.9242 

-9.8688 

-9.9419 

-10.0230 

-9.7063 

(3,2) 

-11.9197 

-9.8632 

-9.9361 

-10.0169 

-9.7005 

(3,3) 

-11.9147 

-9.8588 

-9.9374 

-10.0115 

-9.6969 

(3,4) 

-11.9167 

-9.8526 

-9.9315 

-10.0099 

-9.6901 

(3,5) 

-11.9104 

-9.8453 

-9.9303 

-10.0043 

-9.6930 

(4,0) 

-11.9266 

-9.8678 

-9.9445 

-10.0242 

-9.7057 

(4,1) 

-11.9261 

-9.8619 

-9.9413 

-10.0183 

-9.6999 

(4,2) 

-11.9161 

-9.8563 

-9.9359 

-10.0130 

-9.6940 

(4,3) 

-11.9107 

-9.8501 

-9.9306 

-10.0094 

-9.6889 

(4,4) 

-11.9053 

-9.8454 

-9.9253 

-10.0033 

-9.6825 

(4,5) 

-11.9119 

-9.8402 

-9.9199 

-9.9974 

-9.6801 

(5,0) 

-11.9224 

-9.8616 

-9.9423 

-10.017 

-9.7002 

(5,1) 

-11.9205 

-9.8556 

-9.9363 

-10.011 

-9.6942 

(5,2) 

-11.9157 

-9.8501 

-9.9337 

-10.006 

-9.6897 

(5,3) 

-11.9057 

-9.8439 

-9.9293 

-10.003 

-9.6830 

(5,4) 

-11.9048 

-9.8380 

-9.9244 

-9.9974 

-9.6771 

(5,5) 

-11.8976 

-9.8328 

-9.9167 

-9.9929 

-9.6710 
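
As a small illustration of how Table 1 is read, the following Python sketch picks, for each exchange rate, the ARMA order with the smallest SIC among four rows copied from the table. The helper name `select_by_sic` is hypothetical, and only a subset of the table is included to keep the example short.

```python
# SIC values copied from four rows of Table 1 (columns: CD, DM, JY, PS, SF).
SERIES = ["CD", "DM", "JY", "PS", "SF"]
SIC = {
    (0, 0): [-11.9439, -9.8901, -9.9605, -10.0422, -9.7273],
    (0, 1): [-11.9392, -9.8850, -9.9588, -10.0372, -9.7222],
    (1, 0): [-11.9389, -9.8843, -9.9596, -10.0385, -9.7214],
    (1, 1): [-11.9335, -9.8786, -9.9537, -10.0333, -9.7160],
}


def select_by_sic(sic_table, series):
    """Return, for each series, the ARMA order with the smallest SIC value."""
    best = {}
    for col, name in enumerate(series):
        best[name] = min(sic_table, key=lambda order: sic_table[order][col])
    return best


selected = select_by_sic(SIC, SERIES)
print(selected)
```

Among the rows shown here, the ARMA(0,0) specification has the smallest SIC for all five series.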
Table 2.  The PSC Values of Feedforward and Recurrent Networks.

                                       PSC Values
Network           Feedforward Networks                    Recurrent Networks
Models      CD      DM      JY      PS      SF        CD      DM      JY      PS      SF

(1,1)     .6906   .5580   .5143   .4872   .6428     .6936   .5580   .5172   .4973   .6430
(2,1)     .6923   .5577   .5028   .4869   .6426     .6907   .5574   .5144   .4868   .6415
(3,1)     .6869   .5590   .5149   .4863   .6423     .6913   .5566   .5159   .4863   .6411
(4,1)     .6905   .5571   .5173   .4873   .6407     .6889   .5586   .5161   .4871   .6408
(5,1)     .6914   .5530   .5159   .4877   .6395     .6901   .5531   .5170   .4866   .6425
(6,1)     1.200   .5569   .5146   .4850   .6400     .6868   .5579   .5174   .4844   .6427
(1,2)     .6827   .5577   .5143   .4873   .6398     .6931   .5567   .5174   .4861   .6426
(2,2)     .6906   .5580   .5028   .4865   .6425     .6882   .5589   .5146   .4863   .6408
(3,2)     .6918   .5563   .5131   .4864   .6418     .6904   .5533   .5148   .4870   .6403
(4,2)     .6852   .5560   .5126   .4843   .6410     .6880   .5575   .5143   .4843   .6406
(5,2)     .6875   .5544   .5152   .4827   .6417     .6826   .5579   .5166   .4849   .6421
(6,2)     .6893   .5577   .5137   .4842   .6360     .6913   .5570   .5120   .4847   .6415
(1,3)     .6817   .5567   .5125   .4906   .6409     .6905   .5578   .5052   .4894   .6433
(2,3)     .7125   .5595   .5084   .4862   .6396     .7003   .5571   .5509   .4849   .6430
(3,3)     .6904   .5571   .5143   .4848   .6399     .7888   .5581   .5167   .4948   .6373
(4,3)     .6885   .5520   .5123   .4834   .6402     .6859   .5556   .5144   .4853   .6417
(5,3)     .6908   .6847   .5120   .4821   .6414     .6889   .5552   .5205   .4819   .6411
(6,3)     .6855   .5579   .5136   .4864   .6428     .6796   .5544   .5143   .4828   .6380
(1,4)     .7005   .5555   .4941   .4884   .6452     .8418   .5578   .6200   .5046   .6712
(2,4)     .7144   .5564   .5095   .4992   .6349     .7323   .5539   .4962   .4886   .6419
(3,4)     .6807   .5550   .5144   .4853   .6391     .6885   .5572   .5130   .4838   .6405
(4,4)     .6834   .5537   .5153   .4820   .6421     .6846   .5561   .5112   .4805   .6396
(5,4)     .6875   .5545   .5061   .4838   .6423     .6839   .5520   .5107   .4851   .6400
(6,4)     .6909   .5520   .5051   .4805   .6348     .6845   .5485   .5141   .4805   .6412
(1,5)     .6824   .5617   .4885   .4867   .6419     .7093   .5625   .5094   .5022   .6457
(2,5)     .6853   .5544   .4960   .4856   .6431     .6880   .5532   .5432   .4908   .6419
(3,5)     .7247   .5520   .5108   .4835   .6390     .6846   .5566   .5028   .4840   .7249
(4,5)     .6874   .5769   .5050   .4836   .6628     .1075   .5544   .5134   .4810   .6417
(5,5)     .6863   .5559   .5057   .4836   .6329     .6850   .5524   .5112   .4724   .6320
(6,5)     .6830   .5550   .5118   .4821   .6401     .6698   .5457   .5091   .4786   .6412
(1,6)     .6868   .5701   .4875   .4868   .6385     .7016   .5575   .5819   .4875   .6950
(2,6)     .7125   .5593   .5091   .4822   .6389     .7058   .5505   .5313   .4819   .6572
(3,6)     .6862   .5543   .4934   .4896   .6422     .6738   .5548   .5079   .4819   .6328
(4,6)     .7251   .5519   .5139   .4918   .6378     .6739   .5544   .4899   .4861   .6397
(5,6)     .6815   .5528   .5055   .4773   .6399     .6800   .5514   .5089   .4820   .6403
(6,6)     .6810   .5580   .5055   .4759   .6252     .6890   .5532   .5084   .4795   .6362

Notes:  Network model (L,H) is the network with L lagged targets as inputs and H hidden units.
The other tables follow this convention. The PSC values are the numbers in the table $\times 10^{-1}$,
except for the CD, which are $\times 10^{-2}$.
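
The ranking of networks by PSC can be illustrated with a short Python sketch that sorts the feedforward CD column of Table 2 and keeps the five smallest values. The helper name `best_networks` is hypothetical; the values are the table entries as printed, and the common scale factor is omitted because it does not affect the ranking.

```python
# Feedforward-network PSC values for CD, copied from Table 2.
# Each row of the list holds L = 1..6 lagged inputs for a fixed number H of hidden units.
PSC_CD_FF = [
    0.6906, 0.6923, 0.6869, 0.6905, 0.6914, 1.200,   # H = 1
    0.6827, 0.6906, 0.6918, 0.6852, 0.6875, 0.6893,  # H = 2
    0.6817, 0.7125, 0.6904, 0.6885, 0.6908, 0.6855,  # H = 3
    0.7005, 0.7144, 0.6807, 0.6834, 0.6875, 0.6909,  # H = 4
    0.6824, 0.6853, 0.7247, 0.6874, 0.6863, 0.6830,  # H = 5
    0.6868, 0.7125, 0.6862, 0.7251, 0.6815, 0.6810,  # H = 6
]

# Model labels (L, H) in the same order as the table rows.
models = [(lags, hidden) for hidden in range(1, 7) for lags in range(1, 7)]
psc = dict(zip(models, PSC_CD_FF))


def best_networks(psc_by_model, k=5):
    """Return the k models with the smallest PSC values, best first."""
    return sorted(psc_by_model, key=psc_by_model.get)[:k]


top5 = best_networks(psc)
print(top5)
```

For this column the five smallest values belong to networks (3,4), (6,6), (5,6), (1,3), and (1,5), in that order.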
Table 3.  Out-of-Sample MSE and Sign Predictions of the Selected Networks.

Exchange     Feedforward Networks        Recurrent Networks               ARMA
Rates       Selected    MSE    Sign     Selected    MSE    Sign     Models    MSE    Sign

CD           (3,4)     .1896   .52       (6,5)     .2350   .42      (0,0)    .1906   N/A
             (6,6)     .2423   .44       (3,6)     .1862   .50      (1,0)    .1889   .46
             (5,6)     .2730   .36       (4,6)     .1924   .54      (0,1)    .1888   .46
             (1,3)     .1884   .54       (6,3)     .2353   .42      (1,1)    .1884   .46
             (1,5)     .1875   .58       (5,6)     .2196   .38      (2,2)    .1895   .44

DM           (4,6)     .1795   .68       (6,5)     .2041   .56      (0,0)    .2098   N/A
             (6,4)     .1962   .56       (6,4)     .2416   .54      (1,0)    .2098   .48
             (4,3)     .1830   .58       (2,6)     .1984   .64      (0,1)    .2098   .48
             (3,5)     .1899   .60       (5,6)     .1873   .62      (1,1)    .2096   .46
             (5,6)     .2017   .66       (5,4)     .2107   .52      (2,2)    .2033   .54

JY           (1,6)     .1160   .66       (4,6)     .1168   .58      (0,0)    .1225   N/A
             (1,5)     .1158   .62       (2,4)     .1133   .60      (1,0)    .1199   .52
             (3,6)     .1205   .60       (3,5)     .1220   .58      (0,1)    .1201   .50
             (1,4)     .1162   .62       (1,3)     .1166   .60      (1,1)    .1198   .52
             (2,5)     .1127   .66       (3,6)     .1122   .58      (2,2)    .1202   .50

PS           (6,6)     .3880   .52       (5,5)     .4034   .62      (0,0)    .3884   N/A
             (5,6)     .4204   .50       (6,5)     .4295   .54      (1,0)    .3893   .60
             (6,4)     .3866   .46       (6,6)     .4004   .62      (0,1)    .3896   .60
             (4,4)     .3929   .62       (4,4)     .3895   .66      (1,1)    .3915   .56
             (5,3)     .3902   .64       (6,4)     .3838   .48      (2,2)    .3909   .58

SF           (6,6)     .2129   .62       (5,5)     .1796   .62      (0,0)    .2157   N/A
             (5,5)     .1929   .64       (3,6)     .1965   .64      (1,0)    .2162   .56
             (6,4)     .1897   .64       (6,6)     .2490   .50      (0,1)    .2162   .54
             (2,4)     .1954   .62       (3,3)     .1950   .66      (1,1)    .2158   .58
             (6,2)     .2146   .50       (6,3)     .1958   .54      (2,2)    .2124   .52

Notes:  For each exchange rate, the selected networks are ordered from the best to the 5-th best,
according to the PSC values in Table 2. "MSE" stands for out-of-sample MSE; "Sign" stands for
the percentage of correct sign predictions of the corresponding models. MSE are the numbers in the
table $\times 10^{-4}$, except for the CD, which are $\times 10^{-5}$.
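
The two out-of-sample measures reported in Table 3 can be sketched in Python as follows. The function names and the toy numbers are hypothetical, and counting a forecast as correct only when the product of actual and predicted changes is strictly positive is an assumption about how zero changes would be handled.

```python
def out_of_sample_mse(actual, forecast):
    """Mean squared forecast error over the out-of-sample period."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(err * err for err in errors) / len(errors)


def sign_hit_rate(actual, forecast):
    """Proportion of periods in which the forecast has the same sign as the actual change."""
    hits = sum(1 for a, f in zip(actual, forecast) if a * f > 0)
    return hits / len(actual)


# Toy example with made-up changes (not taken from the exchange rate data).
actual = [0.5, -0.25, 0.5, -0.5]
forecast = [0.25, 0.25, 0.5, -0.25]
mse = out_of_sample_mse(actual, forecast)
hit_rate = sign_hit_rate(actual, forecast)
print(mse, hit_rate)
```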
Table 4.  Out-of-Sample MSE of Feedforward and Recurrent Networks.

                                   Out-of-Sample MSE
Network           Feedforward Networks                    Recurrent Networks
Models      CD      DM      JY      PS      SF        CD      DM      JY      PS      SF

(1,1)     .1861   .1998   .1184   .3681   .2063     .1881   .1998   .1217   .3704   .2098
(2,1)     .1883   .1982   .1192   .3653   .1995     .1874   .1982   .1192   .3750   .1992
(3,1)     .1879   .1947   .1217   .3785   .1972     .1887   .1960   .1196   .3735   .2003
(4,1)     .1830   .1937   .1181   .3721   .2005     .1931   .1882   .1162   .3751   .2005
(5,1)     .1839   .1830   .1191   .3697   .2032     .1879   .1845   .1179   .3888   .2008
(6,1)     .1886   .2025   .1243   .3902   .1992     .2028   .2001   .1240   .3555   .1946
(1,2)     .1902   .1948   .1202   .3681   .1974     .1881   .1878   .1159   .3496   .2058
(2,2)     .1860   .1982   .1168   .3748   .1960     .1833   .1929   .1161   .3767   .2006
(3,2)     .1687   .2015   .1238   .3913   .1966     .1973   .1921   .1154   .3800   .1989
(4,2)     .1872   .1887   .1171   .4042   .1957     .1896   .1896   .1142   .3862   .1993
(5,2)     .2080   .2124   .1147   .3812   .2033     .1819   .1995   .1167   .3857   .1952
(6,2)     .2128   .1984   .1174   .3619   .2146     .1878   .1893   .1148   .3809   .2041
(1,3)     .1884   .1930   .1206   .3580   .1966     .1873   .1936   .1166   .3543   .2057
(2,3)     .1808   .2044   .1187   .3875   .1963     .1894   .1974   .1163   .3856   .1933
(3,3)     .1884   .1934   .1185   .3842   .2003     .1878   .1925   .1113   .3791   .1950
(4,3)     .1971   .1830   .1136   .3612   .2012     .1878   .1797   .1159   .3949   .1977
(5,3)     .1908   .1985   .1183   .3902   .2081     .1890   .1817   .1156   .3927   .2127
(6,3)     .2086   .2069   .1287   .3651   .2087     .2353   .1870   .1192   .3750   .1958
(1,4)     .1885   .1935   .1162   .3610   .2044     .1897   .1915   .1146   .3750   .1998
(2,4)     .1873   .1911   .1188   .3936   .1954     .1871   .1892   .1133   .3996   .1913
(3,4)     .1896   .1807   .1146   .3686   .1910     .1912   .1822   .1101   .3758   .2053
(4,4)     .1913   .1872   .1089   .3929   .1990     .1840   .1981   .1066   .3895   .2026
(5,4)     .2072   .1842   .1090   .4094   .2038     .2009   .2107   .1022   .3834   .2002
(6,4)     .2130   .1962   .1080   .3866   .1897     .2138   .2416   .1070   .3838   .1903
(1,5)     .1875   .1933   .1158   .3672   .1985     .1901   .1877   .1152   .3638   .2010
(2,5)     .1819   .1923   .1127   .4045   .1984     .1829   .1967   .1138   .3714   .2021
(3,5)     .1836   .1899   .1149   .3861   .1908     .2010   .1935   .1220   .3888   .1932
(4,5)     .1878   .1796   .1202   .4033   .2039     .1953   .1979   .1276   .3722   .2264
(5,5)     .2035   .1926   .1258   .3954   .1928     .1758   .2050   .1139   .4034   .1796
(6,5)     .1998   .2155   .1123   .4002   .1934     .2350   .2041   .1151   .4295   .2293
(1,6)     .1907   .1933   .1160   .3699   .2011     .1882   .1963   .1166   .3731   .1995
(2,6)     .1895   .1953   .1150   .3788   .1920     .1747   .1984   .1122   .4195   .2094
(3,6)     .1826   .1853   .1205   .3854   .1845     .1862   .1933   .1122   .3842   .1965
(4,6)     .2121   .1795   .1069   .4051   .2000     .1924   .1969   .1168   .3915   .1956
(5,6)     .2730   .2017   .1061   .4204   .1784     .2196   .1873   .1361   .3885   .1954
(6,6)     .2423   .1999   .1344   .3880   .2129     .2236   .1995   .1116   .4004   .2490

Notes:  MSE are the numbers in the table $\times 10^{-4}$, except for the CD, which are $\times 10^{-5}$.
Table 5.  Out-of-Sample Sign Predictions of Feedforward and Recurrent Networks.

                          Percentages of Correct Sign Predictions
Network         Feedforward Networks               Recurrent Networks
Models      CD     DM     JY     PS     SF       CD     DM     JY     PS     SF

(1,1)      .56    .64    .60    .72    .66      .56    .62    .60    .72    .64
(2,1)      .56    .62    .64    .72    .60      .56    .62    .64    .70    .66
(3,1)      .56    .54    .60    .72    .56      .56    .62    .62    .54    .62
(4,1)      .54    .58    .60    .58    .64      .50    .60    .60    .72    .64
(5,1)      .56    .60    .52    .64    .68      .56    .58    .52    .46    .54
(6,1)      .56    .62    .46    .72    .54      .30    .52    .50    .50    .62
(1,2)      .50    .62    .50    .72    .66      .56    .62    .66    .70    .66
(2,2)      .56    .62    .62    .58    .66      .52    .54    .56    .60    .66
(3,2)      .58    .58    .50    .42    .60      .46    .56    .62    .66    .58
(4,2)      .56    .60    .60    .60    .68      .48    .60    .58    .58    .54
(5,2)      .26    .52    .50    .64    .54      .54    .58    .58    .68    .58
(6,2)      .44    .60    .60    .58    .50      .56    .60    .52    .64    .60
(1,3)      .54    .60    .62    .64    .66      .56    .66    .60    .60    .54
(2,3)      .52    .54    .62    .66    .72      .54    .56    .60    .60    .56
(3,3)      .54    .56    .62    .56    .62      .52    .60    .64    .68    .66
(4,3)      .42    .58    .68    .60    .58      .58    .64    .60    .66    .66
(5,3)      .52    .58    .60    .64    .56      .56    .62    .62    .60    .56
(6,3)      .56    .54    .54    .60    .60      .42    .62    .56    .40    .54
(1,4)      .54    .60    .62    .58    .60      .50    .64    .66    .72    .68
(2,4)      .56    .54    .50    .62    .62      .56    .60    .60    .68    .62
(3,4)      .52    .64    .62    .60    .66      .52    .60    .62    .70    .46
(4,4)      .50    .64    .62    .62    .60      .56    .58    .64    .66    .54
(5,4)      .46    .64    .56    .50    .56      .48    .52    .64    .64    .60
(6,4)      .44    .56    .66    .46    .64      .14    .54    .60    .48    .62
(1,5)      .58    .58    .62    .68    .66      .54    .68    .66    .64    .58
(2,5)      .56    .64    .66    .62    .64      .52    .58    .58    .68    .62
(3,5)      .52    .60    .64    .62    .64      .48    .62    .58    .68    .64
(4,5)      .56    .60    .60    .52    .56      .54    .58    .52    .68    .62
(5,5)      .52    .64    .44    .48    .64      .60    .54    .56    .62    .62
(6,5)      .52    .58    .58    .46    .54      .42    .56    .54    .54    .60
(1,6)      .52    .58    .66    .62    .62      .56    .66    .58    .72    .62
(2,6)      .52    .60    .58    .62    .64      .56    .64    .64    .52    .60
(3,6)      .52    .58    .60    .52    .68      .50    .64    .58    .60    .64
(4,6)      .38    .68    .70    .60    .58      .54    .58    .58    .66    .60
(5,6)      .36    .66    .70    .50    .62      .38    .62    .50    .54    .64
(6,6)      .44    .50    .50    .52    .62      .40    .58    .56    .62    .50
Table 6.  Out-of-Sample Forecasting Comparison: Networks vs. ARMA Models.

                                   Number of Better Networks
Out-of-    Target          Feedforward Network            Recurrent Network
Sample     ARMA Models    CD   DM   JY   PS   SF        CD   DM   JY   PS   SF

MSE        (0,0)          22   34   31   23   36        23   34   33   24   33
           (1,0)          19   34   26   23   36        18   34   31   27   33
           (1,1)          17   34   26   26   36        17   34   31   29   33
           (2,2)          20   32   28   25   34        20   32   31   28   32

Sign       Coin           28   36   34   32   36        26   36   36   33   35
           (1,0)          29   36   29   22   32        30   36   34   27   29
           (1,1)          29   36   29   27   28        30   36   34   28   27
           (2,2)          32   34   34   26   35        31   34   36   28   34

Notes:  Each number in the table is the number of networks (out of 36 estimated networks) that
predict better than or the same as the corresponding target model. "Coin" stands for a 50% chance of
getting a correct sign prediction.
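
The counting rule behind Table 6 can be illustrated with a short Python sketch. It takes the feedforward CD column of Table 4, read with leading decimal points (for example, .1861), and counts the networks whose out-of-sample MSE is no larger than that of each target ARMA model in Table 3. The helper name `count_at_least_as_good` is hypothetical.

```python
# Out-of-sample MSE of the 36 feedforward networks for CD (Table 4),
# listed by number of hidden units H, with L = 1..6 lagged inputs in each row.
MSE_CD_FF = [
    0.1861, 0.1883, 0.1879, 0.1830, 0.1839, 0.1886,  # H = 1
    0.1902, 0.1860, 0.1687, 0.1872, 0.2080, 0.2128,  # H = 2
    0.1884, 0.1808, 0.1884, 0.1971, 0.1908, 0.2086,  # H = 3
    0.1885, 0.1873, 0.1896, 0.1913, 0.2072, 0.2130,  # H = 4
    0.1875, 0.1819, 0.1836, 0.1878, 0.2035, 0.1998,  # H = 5
    0.1907, 0.1895, 0.1826, 0.2121, 0.2730, 0.2423,  # H = 6
]

# Out-of-sample MSE of the target ARMA models for CD (Table 3).
ARMA_MSE_CD = {(0, 0): 0.1906, (1, 0): 0.1889, (1, 1): 0.1884, (2, 2): 0.1895}


def count_at_least_as_good(network_mse, target_mse):
    """Count networks that predict better than or the same as the target."""
    return sum(1 for value in network_mse if value <= target_mse)


counts = {order: count_at_least_as_good(MSE_CD_FF, target)
          for order, target in ARMA_MSE_CD.items()}
print(counts)
```

Under this reading the counts are 22, 19, 17, and 20, which agree with the feedforward CD column of the MSE panel in Table 6.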